
    The Idaho 3D Virtualization Laboratory

    Three-dimensional (3D) virtualization and visualization is an important component of industry, art, museum curation and cultural heritage, yet the step-by-step process of 3D virtualization has been little discussed. Here we review the Idaho Virtualization Laboratory's (IVL) process of virtualizing a cultural heritage item (artifact) from start to finish. Each step is thoroughly explained and illustrated, including how the object and its metadata are digitally preserved and ultimately distributed to the world.

    The authors would like to thank the National Science Foundation (awards ARC-0808933, 102332, 1237452 and 1321411), the Hitz Foundation, the M. J. Murdock Charitable Trust, Idaho State University, the ISU Informatics Research Institute, and the Idaho Museum of Natural History for supporting this research. Neither the National Science Foundation nor any other funding source is responsible for the advancements, conclusions, or implications of this work.

    Holmer, N. A.; Clement, N.; Dehart, K.; Maschner, H.; Pruitt, J.; Schlader, R.; Van Walsum, M. (2014). The Idaho Virtualization Laboratory 3D Pipeline. Virtual Archaeology Review, 5(10), 21-31. https://doi.org/10.4995/var.2014.4208

    Towards a scientific workflow featuring Natural Language Processing for the digitisation of natural history collections [Version 1]

    We describe an effective approach to automated text digitisation of natural history specimen labels. These labels contain much useful data about the specimen, including its collector, country of origin and collection date. Our approach to extracting these data automatically takes the form of a pipeline, and we recommend components for each stage based on current state-of-the-art technologies.

    Optical Character Recognition (OCR) can be used to digitise text in images of specimens. However, recognising text quickly and accurately in these images can be a challenge for OCR. We show that OCR performance can be improved by first segmenting specimen images into their component parts. This ensures that only text-bearing labels are submitted for OCR processing, rather than whole specimen images, which inevitably contain non-textual information that may lead to false-positive readings. In our testing, Tesseract OCR version 4.0.0 offered promising text recognition accuracy on segmented images.

    Not all the text on specimen labels is printed. Handwritten text varies much more and does not conform to standard shapes and sizes of individual characters, which poses an additional challenge for OCR. Recently, deep learning has enabled significant advances in this area. Google's Cloud Vision, which is based on deep learning and trained on large-scale datasets, is shown to be quite adept at this task. This may go some way towards removing the need for humans to routinely transcribe handwritten text.

    Determining the countries and collectors of specimens has been the goal of previous automated text digitisation research. Our approach also focuses on these two pieces of information. An area of Natural Language Processing (NLP) known as Named Entity Recognition (NER) has matured enough to semi-automate this task. Our experiments demonstrated that existing approaches can accurately recognise location and person names within the text extracted from segmented images via Tesseract version 4.0.0. Potentially, NER could be used in conjunction with other online services, such as those of the Biodiversity Heritage Library, to map the named entities to entities in the biodiversity literature (https://www.biodiversitylibrary.org/docs/api3.html).

    We have highlighted the main recommendations for potential pipeline components. The document also provides guidance on selecting appropriate software solutions, covering automatic language identification, terminology extraction, and the integration of all pipeline components into a scientific workflow that automates the overall digitisation process.
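    The segment-then-OCR-then-NER pipeline described above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the authors' implementation: the Region structure and its "kind" values are hypothetical segmenter output, the stored ocr_text stands in for what Tesseract or Cloud Vision would return, and the tiny country gazetteer plus a "leg."/"coll." regular expression are a rule-based stand-in for the trained NER models the abstract evaluates. The example shows why segmentation matters: a spurious reading from the specimen area never reaches entity extraction.

```python
import re
from dataclasses import dataclass, field

# Toy gazetteer standing in for the LOCATION entities a trained NER model would return.
COUNTRIES = {"Brazil", "Kenya", "Norway", "Peru"}
# Toy rule standing in for PERSON entities: capitalised name tokens after "leg." or "coll.",
# an assumed (not universal) label convention for marking the collector.
COLLECTOR_RE = re.compile(r"\b(?:[Ll]eg|[Cc]oll)\.\s+([A-Z][\w.]*(?:\s+[A-Z][\w.]*)*)")


@dataclass
class Region:
    """One segmented area of a specimen image (hypothetical structure)."""
    kind: str           # assumed segmenter output, e.g. "label", "specimen", "ruler"
    ocr_text: str = ""  # stand-in for what an OCR engine such as Tesseract would read


@dataclass
class LabelRecord:
    countries: list[str] = field(default_factory=list)
    collectors: list[str] = field(default_factory=list)


def select_text_regions(regions: list[Region]) -> list[Region]:
    """Segmentation step: pass only text-bearing labels on to OCR."""
    return [r for r in regions if r.kind == "label"]


def extract_entities(text: str, record: LabelRecord) -> None:
    """Rule-based stand-in for NER: append countries and collectors found in `text`."""
    words = re.findall(r"[A-Z][a-z]+", text)
    record.countries.extend(w for w in words if w in COUNTRIES)
    record.collectors.extend(COLLECTOR_RE.findall(text))


def digitise(regions: list[Region]) -> LabelRecord:
    record = LabelRecord()
    for region in select_text_regions(regions):
        # A real pipeline would run OCR on the cropped label image here;
        # this sketch reuses the text stored on the region.
        extract_entities(region.ocr_text, record)
    return record


if __name__ == "__main__":
    regions = [
        Region("specimen", ocr_text="Kenya"),  # pretend misreading of non-label pixels
        Region("label", ocr_text="Cusco, Peru. leg. J. Smith 12.iv.1921"),
        Region("ruler", ocr_text="0 1 2 3 cm"),
    ]
    print(digitise(regions))
    # LabelRecord(countries=['Peru'], collectors=['J. Smith'])
```

    Keeping each stage as a separate function mirrors the pipeline framing: the stub segmentation, OCR and NER steps can each be swapped for a real component without changing the others.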

    Conceptual design blueprint for the DiSSCo digitization infrastructure - DELIVERABLE D8.1

    DiSSCo, the Distributed System of Scientific Collections, is a pan-European Research Infrastructure (RI) that mobilises and unifies bio- and geodiversity information connected to the specimens held in natural science collections and delivers it to scientific communities and beyond. Bringing together 120 institutions across 21 countries, and combining earlier investments in data interoperability practices with technological advances in digitisation, cloud services and semantic linking, DiSSCo makes the data from natural science collections available as one virtual data cloud, connected with data that emerge from new techniques and are not already linked to specimens. These new data include DNA barcodes, whole-genome sequences, proteomics and metabolomics data, chemical data, trait data, and imaging data (Computer-assisted Tomography (CT), Synchrotron, etc.), to name but a few. They will lead to a wide range of end-user services, beginning with finding, accessing, using and improving data. DiSSCo will deliver the diagnostic information required for novel approaches and new services that will transform the landscape of what is possible in ways that are hard to imagine today. With approximately 1.5 billion objects to be digitised, bringing natural science collections into the information age is expected to generate many tens of petabytes of new data over the coming decades, used on average by 5,000–15,000 unique users every day. This requires new skills, clear policies, robust procedures and new technologies to create, work with and manage large digital datasets over their entire research data lifecycle, including long-term storage, preservation and open access. Such processes and procedures must be derived from, and consistent with, the latest thinking in open science and data management, realising the core principles of 'findable, accessible, interoperable and reusable' (FAIR).
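    The volume figures above can be sanity-checked with simple arithmetic. The abstract gives no exact total, so the 10 PB and 50 PB values below are assumed bounds for "many tens of petabytes", and decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) are assumed throughout. Under those assumptions the average works out to a few megabytes to a few tens of megabytes per object.

```python
# Back-of-envelope check of the data volumes quoted in the abstract.
OBJECTS = 1.5e9   # "approximately 1.5 billion objects to be digitised"
PB = 10**15       # one decimal petabyte, in bytes


def mean_mb_per_object(total_pb: float, objects: float = OBJECTS) -> float:
    """Average megabytes (10**6 bytes) per object if `total_pb` petabytes are produced."""
    return total_pb * PB / objects / 1e6


# 10 and 50 PB are assumed bounds for "many tens of petabytes"; the abstract gives no figure.
for total_pb in (10, 50):
    print(f"{total_pb} PB / 1.5 billion objects ~ {mean_mb_per_object(total_pb):.1f} MB per object")
# 10 PB / 1.5 billion objects ~ 6.7 MB per object
# 50 PB / 1.5 billion objects ~ 33.3 MB per object
```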
Synthesised from the results of the ICEDIG project ('Innovation and Consolidation for Large Scale Digitisation of Natural Heritage', EU Horizon 2020 grant agreement No. 777483), the DiSSCo Conceptual Design Blueprint covers organisational arrangements; processes and practices; architecture, tools and technologies; culture, skills and capacity building; and governance and business model proposals for constructing the digitisation infrastructure of DiSSCo. In this context, the digitisation infrastructure of DiSSCo must be interpreted as the infrastructure (machinery, processing, procedures, personnel, organisation) offering Europe-wide capabilities for mass digitisation and digitisation-on-demand, and for the subsequent management (i.e., curation, publication, processing) and use of the resulting data. The blueprint provides the essential background for further work to raise the overall maturity of the DiSSCo Programme across multiple dimensions (organisational, technical, scientific, data, financial) and so achieve readiness to begin construction. Today, collection digitisation efforts have reached most collection-holding institutions across Europe. Many of the leaders and staff involved in digitisation and in working with digital collections wish to take steps forward and expand these efforts to benefit further from their already noticeable positive effects. Taken together, the examination of technical, financial, policy and governance aspects shows the way forward to operating a large distributed initiative, i.e., the Distributed System of Scientific Collections (DiSSCo), for natural science collections across Europe. Ample examples, opportunities and needs for innovation and consolidation in large-scale digitisation of natural heritage have been described.
The blueprint makes one hundred and four (104) recommendations to be considered by other elements of the DiSSCo Programme of linked projects (i.e., SYNTHESYS+, COST MOBILISE, DiSSCo Prepare, and others to follow) and by the DiSSCo Programme leadership as the journey towards organisational, technical, scientific, data and financial readiness continues. Nevertheless, significant obstacles must be overcome as a matter of priority if DiSSCo is to move beyond its Design and Preparatory Phases during 2024. Specifically, the priority actions include:

Organisational: Strengthen common purpose by adopting a common framework for policy harmonisation and capacity enhancement across broad areas, especially digitisation strategy and prioritisation, digitisation processes and techniques, data and digital media publication and open access, protection of and access to sensitive data, and administration of access and benefit sharing. Pursue the joint ventures and other relationships necessary for the successful delivery of the DiSSCo mission, especially with GBIF and other international and regional digitisation and data aggregation organisations, in the context of infrastructure policy frameworks such as EOSC. Proceed with the explicit aim of avoiding divergent approaches to global natural science collections data management and research.

Technical: Adopt and enhance the DiSSCo Digital Specimen Architecture and, as a matter of urgency, establish the persistent identifier scheme to be used by DiSSCo and (ideally) other comparable regional initiatives. Establish the (software) engineering development and (infrastructure) operations teams and direction essential to delivering the services and functionalities expected from DiSSCo, so that engineering can begin in earnest and lead to an early start of DiSSCo operations.
Scientific: Establish a common digital research agenda that uses Digital (extended) Specimens as anchoring points for all specimen-associated and -derived information, demonstrating to research institutions and policy- and decision-makers the new possibilities, opportunities and value of participating in the DiSSCo research infrastructure.

Data: Adopt the FAIR Digital Object Framework and the International Image Interoperability Framework (IIIF) as the low-entropy means of achieving uniform access to rich data (image and non-image) that is findable, accessible, interoperable and reusable (FAIR). Develop and promote best-practice approaches for achieving the best digitisation results in terms of quality (best, according to agreed minimum information and other specifications), time (highest throughput) and cost (lowest per specimen).

Financial: Broaden the attractiveness (i.e., improve the bankability) of DiSSCo as an infrastructure to invest in. Plan ways to bridge funding gaps and so avoid disruptions in the critical funding path that could interrupt core operations, especially a gap opening between the end of preparation and the beginning of implementation due to unresolved political difficulties.

Strategically, it is vital to balance the multiple factors addressed by the blueprint against one another to achieve the desired goals of the DiSSCo Programme. Decisions cannot be taken on one aspect alone without considering the others, and here the various governance structures of DiSSCo (General Assembly, advisory boards and stakeholder forums) will play a critical role over the coming years.
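The Scientific and Data recommendations above pair two ideas: a Digital (extended) Specimen that anchors all derived information under one persistent identifier, and IIIF for uniform image access. The sketch below illustrates both under stated assumptions. The DigitalSpecimen class, its fields and the example PID and server URLs are hypothetical and do not model the actual DiSSCo Digital Specimen Architecture, PID scheme or the FAIR Digital Object Framework. The URL builder follows the public IIIF Image API 3.0 request pattern, {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}, where "max" requests the full-size image (older 2.x servers use "full" instead).

```python
from dataclasses import dataclass, field
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full", size: str = "max",
                   rotation: str = "0", quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API 3.0 request of the form
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}.
    The identifier is percent-encoded, including any '/' it contains."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/{region}/{size}/{rotation}/{quality}.{fmt}"


@dataclass
class DigitalSpecimen:
    """Hypothetical, much-simplified stand-in for a Digital (extended) Specimen:
    one persistent identifier anchoring everything derived from a physical specimen."""
    pid: str               # placeholder; the DiSSCo PID scheme itself is not modelled here
    catalogue_number: str  # identifier of the physical object in its home collection
    links: dict[str, list[str]] = field(default_factory=dict)

    def link(self, kind: str, ref: str) -> None:
        """Attach a derived record (image, sequence, CT scan, trait...) under a data type."""
        self.links.setdefault(kind, []).append(ref)


if __name__ == "__main__":
    spec = DigitalSpecimen(pid="https://example.org/pid/0001", catalogue_number="HERB-42")
    base = "https://iiif.example.org/iiif/3"  # hypothetical IIIF image server
    spec.link("image", iiif_image_url(base, "herbarium/sheet-42"))
    spec.link("thumbnail", iiif_image_url(base, "herbarium/sheet-42", size="200,"))  # 200 px wide
    spec.link("dna_barcode", "https://example.org/sequences/0001")  # hypothetical record
    for kind, refs in spec.links.items():
        print(kind, refs)
```

Because IIIF fixes the URL grammar, any client can request a crop, thumbnail or rotation of a specimen image without knowing which institution's server holds it. That is the kind of uniform access the Data recommendation describes.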